Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations
نویسندگان
چکیده
This paper presents a case study about the applicability and usage of non blocking collective operations. These operations provide the ability to overlap communication with computation and to avoid unnecessary synchronization. We introduce our NBC library, a portable low-overhead implementation of non blocking collectives on top of MPI-1. We demonstrate the easy usage of the NBC library with the optimization of a conjugate gradient solver with only minor changes to the traditional parallel implementation of the program. The optimized solver runs up to 34% faster and is able to overlap most of the communication. We show that there is, due to the overlap, no performance difference between Gigabit Ethernet and InfiniBand for our calculation.
منابع مشابه
Parallel scaling of Teter’s minimization for Ab Initio calculations
We propose a parallelization scheme for the conjugate gradient method by Teter et. al. and report a detailed analysis of its scalability. We use MPI collective operations exclusively to take advantage of optimized collective implementations with possible hardware support. Our parallel conjugate gradient calculation can be applied in addition to the already implemented parallelism in the applica...
متن کاملMPI collectives at scale
Collective operations improve the performance and reduce code complexity of many applications parallelized with the messagepassing interface (MPI) paradigm. In this article, we will investigate the impact of load imbalance on the performance of collective operations and possibility for hiding parallel overhead caused by a collective communication pattern, by overlapping the communication with c...
متن کاملTowards Automatic Support of Parallel Sparse
In this paper, we present a generic matrix class in Java and a runtime environment with continuous compilations aiming to support automatic parallelization of sparse computations on distributed environments. Our package comes with a collection of matrix classes including operators of dense matrix, sparse matrix, and parallel matrix on distributed memory environments. In our environment, a progr...
متن کاملA Case for Standard Non-blocking Collective Operations
In this paper we make the case for adding standard nonblocking collective operations to the MPI standard. The non-blocking point-to-point and blocking collective operations currently defined by MPI provide important performance and abstraction benefits. To allow these benefits to be simultaneously realized, we present an application programming interface for non-blocking collective operations i...
متن کاملComparison the Sensitivity Analysis and Conjugate Gradient algorithms for Optimization of Opening and Closing Angles of Valves to Reduce Fuel Consumption in XU7/L3 Engine
In this study it has been tried, to compare results and convergence rate of sensitivity analysis and conjugate gradient algorithms to reduce fuel consumption and increasing engine performance by optimizing the timing of opening and closing valves in XU7/L3 engine. In this study, considering the strength and accuracy of simulation GT-POWER software in researches on the internal combustion engine...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Parallel Computing
دوره 33 شماره
صفحات -
تاریخ انتشار 2006